.\"/*
.\" * Part of the LLVM Project, under the Apache License v2.0 with LLVM Exceptions.
.\" * See https://llvm.org/LICENSE.txt for license information.
.\" * SPDX-License-Identifier: Apache-2.0 WITH LLVM-exception
.\" *
.\" */
.NS 4 "Parser"
.lp
The Parser (\c
.i parser.c )
performs an LR(1) parse of the input source
using parse tables generated
by the Parse Table Generator utility
and tokens returned by the Scanner.
For a description of the LR parsing method used, see
references [6] and [7].
.lp
Refer to Appendix I for a listing of the grammar
which drives the Parser.
.sh 2 "Data Structures"
.us "Parse Stack"
contains parse states and is maintained by the Parser.
When a 'read transition' occurs, a state is added to the stack.
When a reduction occurs, the stack is reduced by the number of
tokens on the right hand side of the reduction minus one.
.us "Semantic Stack"
is maintained in parallel with the Parse Stack, and is
described in detail in section 5 since it is only used by the
Semantic Analyzer.
The variable,
.cw stktop ,
is used to point to the current top of the semantic stack.
Whenever a read transition occurs, the Parser adds to the
semantic stack, information on the current token (for example symbol table
pointer) which the Scanner passed to it.
.us "Parse Tables"
tables created by the utility PRSTAB and which control the
parsing process.
These tables are described in the PRSTAB Maintenance and Operation
Document, [7].
.sh 2 "Processing Overview"
The Parser uses parse tables generated by the Parse Table Generator
utility (PRSTAB), and tokens returned by the Scanner.
As each parse reduction is made, the Semantic Analyzer is
called to perform the function associated with the
particular reduction.
.lp
The parser maintains the parse stack.
Semantic actions are done in parallel with reductions
involving the parse stack.
Figure 4-1 gives a pseudo-code description of the operation of the
Parser.
.(z
.hl

.CS
loop for each statement in subprogram unit {

    parse_init();

    repeat  /* once for each input token in the statement */   {
	get token;
	while (a reduction can be made using the current
	       token as a look-ahead symbol)   {
	    call Semantic Analyzer for this reduction;
	    if (sem.ignore_stmt) {
		/* skip this statement */
		goto next_statement;
	    }
	    replace on the parse stack, the right hand side
		 symbols by the left hand side symbol for 
		 this reduction;

	    current_state = new value determined from
				Parse Tables;
	    stack current_state on Parse Stack;
	}

	if (there is a valid read transition)  {
	    current_state = new value determined using
		 current_state, token, and Parse Tables;
	    stack current_state information on
		 Parse Stack.
	}
	else
	    error;

	if (token == end_of_line) {
	     if end-of-program-unit return;
	     if at end-of-file { error; return; }
	     break;
	}
    }
next_statement:;
}
.CE
.sp
.ce
Figure 4-1: Parser Operation
.hl
.)z
.lp
The
.cw parse_init
routine sets the Parse Stack to its initial
state.
.cw parse_init
calls the
.i reset
routine in the lexical analyzer to begin processing tokens
from the next statement.
.lp
During semantic analysis, the semantic analyzer,
while processing reductions, may detect an error for which it's
useless to process the remaining tokens in the statement.
For example, if an executable statement appears in an interface,
there's no need to semantically analyze the statement.
The semantic analyzer can direct the parser to ignore the
remaining portion of the statement and to move on to the
next statement;
the global variable (a flag) for effecting this is
.cw sem.ignore_stmt .
.lp
The last token processed by the parser for a statement is
the
.i "end of line"
(
.cw TK_EOL
)
token.
When this token is seen, the parser determines if the
.cw END
statement is the current statement.
If it is, the parser returns to the caller (main)
since this represents the end of the current subprogram.
.sh 2 "Parse Table Generator (PRSTAB)"
.sh 3 Overview
The main function of PRSTAB is to generate data initializations
for the Parse Tables used by the Parser.
In addition, it generates three files of C
.cw #define
directives
used to coordinate the operation of the Scanner, Parser, and
Semantic Analyzer, and two files of static character information
used for error messages and debugging.
PRSTAB generates a listing and cross reference of the grammar
which is used as Appendix I of this document.
.sh 3 Inputs
PRSTAB requires two input files:
.np
Grammar Definition File.
This file defines the grammar used by \*(SC.
Reference [6] describes the format of this file;
a short example is given here:
.(b
.CS
<entry statement> ::=    <entry id>  |
			 <entry id> ( ) |
			 <entry id> ( <id list> )
 
<entry id> ::=           ENTRY <ident>
.CE
.)b
If the semantic actions (see section 5) are divided into
multiple C source files, this file contains a 
.cw .BR
directive
at the points in the grammar where the breaks occur.
.np
Token Definition File.
This file contains one line for each terminal symbol of
the grammar (token).
Each line consists of the token name as it appears in the
grammar definition file, followed by the name which will be used
for the C constant symbol representing the token id number
within \*(SC.
These names must differ in the first 8 characters, and
begin with
.cw TK_ .
Example:
.(b
.CS
<ident> TK_IDENT
CALL    TK_CALL
+       TK_PLUS
.CE
.)b
.sh 3 Outputs
There are six output files:
.np
Parse Table Definition File.
This file contains the C data definitions and initialization
code for the Parse Tables.
It is included directly into the Parser source module.
.np
Token Id Number Definition File.
This file contains a series of
.cw #define
directives for the token
id numbers.
The symbol names used are those specified in the Token Definition File.
.np
Semantic Actions Definition File.
This file contains a series of
.cw #define
directives for the
semantic action case labels (see section 5).
There is one definition for each production of the grammar.
The names for these symbols are constructed from the
name of the symbol on the left hand side of each production,
using an algorithm described in [6].
.np
Non-terminal Number Definition File.
This file contains a series of
.cw #define
directives for the grammar's non-terminal symbols.  The names for these
symbols are of the form
.cw NT_<name>
where name is constructed from the symbol's name in the grammar in an
obvious fashion.
.np
Token Name Definition File.
This file contains a statically initialized array indexed by terminal and
non-terminal number giving the name (exactly as in the grammar) of each
terminal and non-terminal symbol.  It is used for error messages and
debugging.
.np
Grammar Listing and Cross Reference File.
Lists grammar with alphabetic cross reference of grammar symbols
at end.  Used to create Appendix I of this document.
